(57) Abstract
A method of accessing information includes processing a query, searching a collection of
results, outputting a prose rendition of the query and outputting the subset of results.

FIG. 9 is a flow diagram of a database aliasing file
generation process used by the information access process of
FIG. 2.

FIG. 10 is a flow diagram of a query expansion process
used by the information access process of FIG. 2.

DETAILED DESCRIPTION

Referring to FIG. 1, a network configuration 2 for
executing an information access process includes a user
computer 4 connected via a link 6 to an Internet 8. The link
6 may be a telephone line or some other connection to the
Internet 8, such as a high speed T1 line. The network
configuration 2 further includes a link 10 from the Internet 8
to a client system 12. The client system 12 is a computer
system having at least a central processing unit (CPU) 14, a
memory (MEM) 16, and a link 18 connected to a storage device
20. The storage device 20 includes a database 21, which
contains information that a user may query. The client system
12 is also shown to include a link 22 connecting the client
system 12 to a server 24. The server 24 includes at least a
CPU 25 and a memory 26. A plug-in 27 is shown resident in the
memory 26 of the server 24. The plug-in 27 is an application
program module that allows a web site code running on the
client system 12 to execute an information access process
residing in the memory 26 of the server 24. The plug-in 27
allows the web site application to incorporate results
returned from the information access process while it is
generating HTML for display to the user's browser (not shown).
HTML refers to Hypertext Markup Language and is the set of
markup symbols or codes inserted in a file intended for
display on a World Wide Web browser. The markup tells the Web
browser how to display a Web page's words and images for the
user. The individual markup codes are referred to as elements
(also referred to as tags). As is shown, the server 24 shares
access to the database 21 on the storage device 20 via a link
28. Other network configurations are possible. For example,
a particular network configuration includes the server 24
maintaining a local copy of the database 21. Another network
configuration includes the Internet 8 connecting the client
system 12 to the server 24.

Referring to FIG. 1A, a search process 30 residing on a
computer system includes a user using a web browser on a
computer connecting 32 to the Internet and accessing a client
system. Other embodiments include a direct connection from
the user computer to the client system. The client system
displays 33 a page on the web browser of the user and the user
inputs 34 a query in a query input box of the displayed page.
The query is sent 35 to an information access process residing
on a server for processing. The information access process
processes 36 the query and sends the results to the client
system. The results are then displayed 37 to the user.

Referring to FIG. 2, an information access process 40 on
a computer system receives 42 a query by a user. The query
may be a word or multiple words, sentence fragments, a
complete sentence, and may contain punctuation. The query is
normalized 44 as pretext. Normalization includes checking the
text for spelling and proper separation. A language lexicon
is also consulted during normalization. The language lexicon
specifies a large list of words along with their normalized
forms. The normalized forms typically include word stems,
that is, the suffixes are removed from the words. For
example, the word "computers" would have the normalized form
"computer" with the plural suffix removed.

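
The normalization step can be illustrated with a minimal Python sketch. It assumes the language lexicon is a plain mapping from words to normalized forms; the entries and the name `normalize` are hypothetical, and the spelling check mentioned in the text is omitted.

```python
import re

# Hypothetical language lexicon: surface word -> normalized (stemmed) form.
LANGUAGE_LEXICON = {
    "computers": "computer",
    "laptops": "laptop",
    "speeds": "speed",
}

def normalize(query: str) -> list[str]:
    """Lowercase, separate into word tokens, and map each to its lexicon form."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    # Words missing from the lexicon pass through unchanged.
    return [LANGUAGE_LEXICON.get(tok, tok) for tok in tokens]

print(normalize("Fast computers, please!"))  # ['fast', 'computer', 'please']
```

A table lookup is used here because the text describes the lexicon as a list of words with their normalized forms; an algorithmic stemmer would be a different design.
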
The normalized text is parsed 46, converting the
normalized text into fragments adapted for further processing.
Fragments are produced by annotating words as putative keys
and values, according to a feature lexicon. The feature
lexicon is a vocabulary, or book containing an alphabetical
arrangement of the words in a language or of a considerable
number of them, with the definition of each; a dictionary.
For example, the feature lexicon may specify that the term
"Compaq" is a potential value and that "CPU speed" is a
potential key. Multiple annotations are possible.

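
As a rough sketch of this annotation step, the Python below marks runs of tokens as potential keys or values by greedy longest match against a feature lexicon. The lexicon contents, the `annotate` name and the longest-match strategy are assumptions made for illustration; the text does not say how phrases are matched.

```python
# Hypothetical feature lexicon: phrase -> roles it may play.
FEATURE_LEXICON = {
    "compaq": {"value"},
    "cpu speed": {"key"},
    "500 mhz": {"value"},
}

def annotate(tokens, lexicon=FEATURE_LEXICON, max_len=3):
    """Greedy longest-match annotation of token runs as potential keys/values."""
    fragments, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                fragments.append((phrase, sorted(lexicon[phrase])))
                i += n
                break
        else:
            fragments.append((tokens[i], []))  # no annotation found
            i += 1
    return fragments

print(annotate(["compaq", "500", "mhz", "cpu", "speed"]))
# [('compaq', ['value']), ('500 mhz', ['value']), ('cpu speed', ['key'])]
```
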
The fragments are inflated 48 by the context in which the
text inputted by the user arrived, e.g., a previous query, if
any, that was inputted and/or a content of a web page in which
the user text was entered. The inflation is performed by
selectively merging 50 state information provided by a session
service with a meaning representation for the current query.
The selective merging is configurable based on rules that
specify which pieces of state information from the session
service should be merged into the current meaning
representation and which pieces should be overridden or masked
by the current meaning representation.
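
One possible reading of the selective merge is sketched below in Python. It assumes both the session state and the meaning representation are flat dictionaries and that each rule is either "merge" or "mask"; these names and the rule format are hypothetical.

```python
def inflate(current, session_state, rules):
    """Merge session state into the current meaning representation.

    rules maps a key to "merge" (inherit from the session when the current
    query is silent) or "mask" (never inherit). Unlisted keys default to "mask".
    """
    inflated = dict(current)
    for key, value in session_state.items():
        if key in current:
            continue  # the current query overrides earlier state
        if rules.get(key, "mask") == "merge":
            inflated[key] = value
    return inflated

state = {"category": "laptop", "price_max": 1500, "sort": "price"}
current = {"price_max": 1000, "brand": "Compaq"}
rules = {"category": "merge", "price_max": "merge", "sort": "mask"}
print(inflate(current, state, rules))
# {'price_max': 1000, 'brand': 'Compaq', 'category': 'laptop'}
```

Here the earlier category carries over, the new price limit replaces the old one, and the earlier sort order is masked.
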

The session service stores all of the "conversations"
that occur at any given moment during all of the user's
session. State information is stored in the session service
providing a method of balancing load with additional computer
configurations. Load balancing may send each user query to a
different configuration of the computer system. However,
since query processing requires state information, storage of
state information on the computer system will not be
compatible with load balancing. Hence, use of the session
service provides easy expansion by the addition of computer
systems, with load sharing among the systems to support more
users.

The state information includes user specified constraints
that were used in a previous query, along with a list of
features displayed by the process 40 and the web page
presented by the main server. The state information may
optionally include a result set, either in its entirety or in
condensed form, from the previous query to speed up subsequent
processing in context. The session service may reside in one
computer system, or include multiple computer systems. When
multiple computer systems are employed, the state information
may be assigned to a single computer system or replicated
across more than one computer system.

Referring now to FIG. 3, the inflated sentence fragments
are converted 52 into meaning representation by making
multiple passes through a meaning resolution process 70. The
meaning resolution process 70 determines 72 if there is a
valid interpretation within the text query of a key-value
grouping of the fragment. If there is a valid interpretation,
the key value grouping is used 74. For example, if the input
text, i.e., inflated sentence fragment, contains the string
"500 MHz CPU speed," which may be parsed into two fragments,
"500 MHz" value and "CPU speed" key, then there is a valid
grouping of key = "CPU speed" and value = "500 MHz".

If no valid interpretation exists, a determination 76 is
made on whether the main database contains a valid
interpretation. If there is a valid interpretation in the
main database, the key value group is used 74. If no valid
interpretation is found in the main database, the process 70
determines 78 whether previous index fields have a high
confidence of uniquely containing the fragment. If so, the
key value grouping is used 74. If not, other information
sources are searched 80 and a valid key value group is
generated. If a high confidence and valid putative key is
determined through one of the information sources consulted,
then the key and value are grouped to form an atomic element
and the grouping is used 74. To make it possible to override
false interpretations, a configuration of grammar can also
specify manual groupings of keys and values that take
precedence over the meaning resolution process 70.
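
The order of checks in the meaning resolution process can be sketched as a simple cascade. The Python below is illustrative only: every argument is a hypothetical lookup table standing in for one information source, and the 0.9 threshold is an assumed value for "high confidence."

```python
def resolve(fragment_value, query_keys, manual, database_keys, index_confidence,
            threshold=0.9):
    """Return (key, value) from the first source that yields a grouping."""
    # Manual groupings from the grammar configuration take precedence.
    if fragment_value in manual:
        return manual[fragment_value], fragment_value
    # 1. A key stated in the query text itself.
    if fragment_value in query_keys:
        return query_keys[fragment_value], fragment_value
    # 2. A key the main database associates with this value.
    if fragment_value in database_keys:
        return database_keys[fragment_value], fragment_value
    # 3. An index field that contains the value with high confidence.
    field, confidence = index_confidence.get(fragment_value, (None, 0.0))
    if confidence >= threshold:
        return field, fragment_value
    return None  # other information sources would be searched here

print(resolve("500 mhz", {"500 mhz": "cpu speed"}, {}, {}, {}))
# ('cpu speed', '500 mhz')
print(resolve("compaq", {}, {}, {"compaq": "manufacturer"}, {}))
# ('manufacturer', 'compaq')
```
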

Referring again to FIG. 2, meaning resolved fragments,
representing the user query, are answered 54. In providing an
answer or answers, logic may decide whether or not to go out
to the main database, whether or not to do a simple key word
search, or whether or not to do direct navigation, and so
forth. Answer or answers are summarized and organized 56.
Summarization and organization may involve intelligent
discarding of excessive and unneeded details to provide more
meaningful results in response to the user query.

When a user asks a question, i.e., submits a query, there
is usually no way to predict how many appropriate results will
be found. The process 40 attempts to present the user with no
more information than can be reasonably absorbed. This is
often dictated by the amount of space available on the user's
displayed web page.

Prose is generated 58. The prose represents the specific
query the user initially asked, followed by organized and
summarized results to the user query. The prose and organized
answers are outputted 60 to the user for display. Output to
the user may involve producing HTML of the prose and organized
answers and/or XML for transmission of the organized answers
and dynamic prose back to the main server for HTML rendering.
XML refers to Extensible Markup Language, a flexible way to
provide common information formats and share both the format
and the data on the World Wide Web, intranets, and elsewhere.
Any individual or group of individuals or companies that wants
to share information in a consistent way can use XML.

Referring to FIG. 4, the control logic of process 40
includes an information interface 80. The purpose of the
information interface 80 is to isolate the control logic from
the details of any given web site on the main server or other
servers, e.g., how they store particular information. For
example, different web sites will name things differently
and/or store things differently. The information interface 80
provides a standard format for both receiving information
from, and sending information to, the control logic of process
40 and normalizes the interface to various information
sources. The information interface 80 includes an information
retrieval process 82, a database (db) aliasing process 84, a
URL driver process 86 and a storage process 88.

An exemplary illustration of a standard format used by
the information interface 80 is shown as follows:

    :features {features
        {feature
            :key 'product price'}
        {feature
            :key 'product min age'}
        {feature
            :key 'product max age'}
        {feature
            :key 'product name'}
        {feature
            :key 'sku'}}
    :constraints {or
        {and
            {feature
                :key 'product description'
                :value {or
                    {value
                        :eq
                            'fire trucks'
                            'fire trucks'}}}}}
    :sort {features
        {feature
            :key 'product price'}
        {feature
            :key 'product min age'}
        {feature
            :key 'product max age'}
        {feature
            :key 'product name'}
        {feature
            :key 'sku'}}

The information interface 80 handles and formats both
"hard" and "soft" searches. A hard search typically involves
a specific query for information, while a soft search
typically involves a very general query for information. For
example, a hard search may be for the price to be less than
$500 where price is a known column in the database and
contains numeric values. A soft search for "fire engine" may
be interpreted by the IR engine to include occurrences of
"fire truck" within textual descriptions.

The URL driver process 86 maintains a URL configuration
file. The URL configuration file stores every detail of a web
site in compressed format. The compression collapses a set of
web pages with the same basic template into one entry in the
URL configuration file. By way of example, the following is a
sample portion of a URL configuration file entry:

    /newcar/$Manufacturer/$Year/$Model/
        keys: overview
    /newcar/$Manufacturer/$Year/$Model/safetyandreliability.asp
        keys: safety reliability

The db aliasing process 84 handles multiple words that
refer to the same information. For example, the db aliasing
process 84 will equate "laptop" and "notebook" computers and
"pc" and "personal computer."

The URL driver process 86 includes bi-directional search
logic for interacting with the URL configuration file. In a
"forward" search direction, a specific query is received and
the search logic searches the URL configuration file for a
best match or matches and assigns a score to the match or
matches, the score representing a relative degree of success
in the match. The score is determined by the number of keys
in the URL configuration entry that match the keys desired by
the current meaning representation of the query. More
matching keys will result in a higher score.

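
The forward search scoring can be sketched in Python as follows. The in-memory list mirrors the two sample URL configuration entries given earlier; the `forward_search` and `fill` names, and the values passed to `fill`, are hypothetical.

```python
from string import Template

# Hypothetical in-memory form of the two sample URL configuration entries.
URL_CONFIG = [
    ("/newcar/$Manufacturer/$Year/$Model/", {"overview"}),
    ("/newcar/$Manufacturer/$Year/$Model/safetyandreliability.asp",
     {"safety", "reliability"}),
]

def forward_search(desired_keys, config=URL_CONFIG):
    """Score each entry by how many of its keys the query asks for."""
    scored = [(len(keys & desired_keys), template) for template, keys in config]
    # Keep entries with at least one matching key, best score first.
    return sorted((s for s in scored if s[0] > 0), reverse=True)

def fill(template, **values):
    """Expand one collapsed template entry into a concrete page address."""
    return Template(template).substitute(values)

matches = forward_search({"safety", "reliability", "price"})
print(matches)
# [(2, '/newcar/$Manufacturer/$Year/$Model/safetyandreliability.asp')]
print(fill(matches[0][1], Manufacturer="acme", Year="2001", Model="roadster"))
# /newcar/acme/2001/roadster/safetyandreliability.asp
```
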
In a "reverse" direction, the search logic contained
within the URL driver process 86 responds to a query by
looking at the contents of the web page which the user is
currently viewing and finds the answer to the new user query
in combination with the features of the web page which the
user is viewing, along with a score of the match or matches.
Thus, the search logic of the URL driver process 86 looks at
the current web page and connects current web page content
with current user queries, thus deriving context from the
previous line of questioning.

As described with reference to FIG. 2, the information
access process 40 contains control logic to provide answers to
a user's query. The answers are summarized and organized.
Typically, the results of a specific database search, i.e.,
user query, will identify many rows of results. These rows
will often result in more than one web page of displayed
results if the total result is taken into account. The
information access process 40 reduces the number of rows of
answers in an iterative fashion.

Referring to FIG. 5, a reduction and summarization
process 110 determines 112 a count of the total number of
results obtained from searching the main database. The
reduction and summarization process 110 determines 114 the
amount of available space on the web page for display of the
answers. A determination 116 is made as to whether the number
of results exceeds the available space on the web page. If
the number of results does not exceed the available space on
the web page, the results are displayed 118 on the web page.
If the number of results exceeds available space on the web
page, a row of results is eliminated 120 to produce a subset
of the overall results. The number of results contained
within the subset is determined 122. The determination 116 of
whether the number of results contained within the subset
exceeds available space on the web page is executed. The
reduction and summarization process 110 continues until the
number of results does not exceed available display space on
the web page.

When a reduction of results is made, the reduction and
summarization process 110 has no prior knowledge of how it
will affect the total count, i.e., how many rows of data will
be eliminated. Reductions may reduce the overall result
count, i.e., rows of result data, in different ways. Before
any reduction and summarization is displayed in tabular form
to the user, the resultant data is placed in a hierarchical
tree structure based on its taxonomy. Some searches will
generate balanced trees, while others will generate unbalanced
trees. Further, some trees will need to be combined with
other trees. To reduce the resultant data, the reduction and
summarization process 110 looks at the lowest members of the
tree, i.e., the leaves, and first eliminates this resultant
data. This results in eliminating one or more rows of data
from the overall count of resultant data. If the overall count
is still too large, the reduction and summarization process
110 repeats itself and eliminates another set of leaves.

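
The iterative reduction can be sketched in Python under some stated assumptions: the result tree is a nested dictionary, each leaf counts as one displayed row, and one pass removes the whole deepest level of leaves. The text does not fix how many leaves are removed per pass, so that choice and all names below are illustrative.

```python
def count_rows(tree):
    """Number of displayed rows = number of leaves in the nested-dict tree."""
    if not tree:
        return 1
    return sum(count_rows(child) for child in tree.values())

def depth(tree):
    """Number of levels below this node (0 for a leaf)."""
    return 0 if not tree else 1 + max(depth(c) for c in tree.values())

def prune_deepest(tree, levels):
    """Return a copy of the tree with its deepest level of leaves removed."""
    if levels <= 1:
        return {}  # the children here are the deepest leaves: drop them
    return {k: (prune_deepest(c, levels - 1) if depth(c) == levels - 1 else c)
            for k, c in tree.items()}

def reduce_results(tree, capacity):
    """Prune leaves until the row count fits the available display space."""
    while count_rows(tree) > capacity and depth(tree) > 0:
        tree = prune_deepest(tree, depth(tree))
    return tree

results = {
    "laptops": {"Compaq": {"$999": {}, "$1299": {}}, "Dell": {"$1099": {}}},
    "desktops": {"Compaq": {"$799": {}}},
}
print(count_rows(results))         # 4
print(reduce_results(results, 3))
# {'laptops': {'Compaq': {}, 'Dell': {}}, 'desktops': {'Compaq': {}}}
```

As in the text, the effect of a pass on the row count is not known in advance, so the count is recomputed after every pass.
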
Eliminating rows (i.e., leaves) to generate a reduced
result set of answers allows the reduction and summarization
process 110 to reduce identical information but maintain
characterization under identical information in the
hierarchical tree structure. The identical rows representing
identical information can be collapsed. For example, if the
eliminated row in the reduced result set contains specific
price information, collapsing the eliminated row may generate
price ranges instead of individual prices.

As mentioned previously, some results may generate
multiple trees. In a particular embodiment, to reduce the
overall amount of resultant data in the result set,
information is eliminated where the greatest number of leaves
are present across multiple trees.

Referring again to FIG. 2, it should be noted that
sometimes the information access process 40 will provide no
summarization and/or reduction of results, e.g., the user asks
for no summarization or the results are very small.

Organization of resultant data generally puts the answers
to the user's query into a hierarchy, like a table, for
example, and the table may include links to other web pages
for display to the user. Links, i.e., addresses associated
with each row of the displayed results, are encoded within
each element of the hierarchical tree structure so that the
user may navigate to a specific web page by clicking on any of
the links of the resultant rows of displayed data. The
encoding is done by including a reference to a specific
session known by the session service along with the address to
the element in the table of results displayed during the
specific session. State information provided by the session
service can uniquely regenerate the table of results. The
address is a specification of the headings in the table of
results.

For example, if an element in the hierarchical structure
is under a subheading "3" which is under a major heading "E,"
the address would specify that the major heading is "E" and
that the subheading is "3." Response planning may also
include navigation to a web page in which the user will find a
suitable answer to their query.

As previously described, prose is generated and added to
the results.

Referring to FIG. 6, a prose process 140 includes
receiving 142 the normalized text query. The normalized text
query is converted 144 to prose and the prose displayed 146 to
the user in conjunction with the results of the user query.

The prose process 140 receives the normalized text query
as a text frame. The text frame is a recursive data structure
containing one or more rows of information, each having a key
that identifies the information. When the text frame is
passed to the prose process 140 it is processed in conjunction
with a prose configuration file. The prose configuration file
contains a set of rules that are applied recursively to the
text frame. These rules include grammar having variables
contained within. The values of the variables come from the
text frame, so when combined with the grammar, prose is
generated. For example, one rule may be "there are $n
products with $product." The variables $n and $product are
assigned values from an analysis of the text frame. The text
frame may indicate $n = 30 and $product = leather. Thus, the
prose that results in being displayed to the user is "there
are 30 products with leather."

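
The variable substitution in a prose rule can be sketched with Python's standard `string.Template`, whose `$name` syntax happens to match the example rule. The rule list and the `generate_prose` name are hypothetical, the text frame is simplified to a flat dictionary, and the recursive application of rules is not shown.

```python
from string import Template

# Hypothetical prose rules, most specific first.
PROSE_RULES = [
    Template("there are $n products with $product."),
    Template("there are $n products."),
]

def generate_prose(frame, rules=PROSE_RULES):
    """Return prose from the first rule whose variables the frame can fill."""
    for rule in rules:
        try:
            return rule.substitute(frame)
        except KeyError:
            continue  # the frame lacks a variable this rule needs
    return ""

print(generate_prose({"n": 30, "product": "leather"}))
# there are 30 products with leather.
print(generate_prose({"n": 5}))
# there are 5 products.
```
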
More than one rule in the prose configuration file may
match the text frame. In such a case, prose process 140 will
recursively build an appropriate prose output. In addition,
if two rules in the prose configuration file match
identically, the prose process 140 may arbitrarily select one
of the two rules, but the database can be weighted to favor
one rule over another. In some cases, default rules may
apply. In addition, some applications may skip over keys and
may use rules more than once.

The prose configuration file also contains standard
functions, such as a function to capitalize all the letters in
a title. Other functions contained within the prose
configuration may pass arguments.

The information access process 40 (of FIG. 2) interfaces
with a number of configuration files in addition to the prose
configuration file. These configuration files aid the
information access process 40 in processing queries with the
most current data contained in the main server database. For
example, the information access process 40 has a bootstrapping
ability to manage changes to a web page of the main server and
to the main server database. This bootstrapping ability is
needed so that when the main server database changes occur,
the most current files are utilized by the information access
process 40.

The information access process 40 also includes a number
of tools that analyze the main server database and build
initial versions of all of the configuration files, like the
prose configuration file; this is generally referred to as
bootstrapping, as described above. Bootstrapping gives the
information access process 40 "genuine" knowledge of what
grammar rules for item searching look like, specific to the
main server database being analyzed.

Referring to FIG. 7, a bootstrap process 170 extracts 172
all text corresponding to keys and values from the main server
database. The extracted text is placed 174 into a feature
lexicon. A language lexicon is updated 176 using a general
stemming process. Grammar files are augmented 178 from the
extracted keys and values. Generic grammar files and
previously built application-specific grammar files are
consulted 180 for rule patterns, that are expanded 182 with
the newly extracted keys and values to comprise a full set of
automatically generated grammar files.

For example, if an application-specific grammar file
specifies that "Macintosh" and "Mac" parse to the same value,
any extracted values containing "Macintosh" or "Mac" will be
automatically converted into a rule containing both "Macintosh"
and "Mac." The structuring of the set of grammar files into
generic, application-specific and site-specific files allows
for automatic generation of new grammar files from the
main server database. The bootstrapping process 170 can build
the logic and prose configuration files provided that a system
developer has inputted information about the hierarchy of
products covered in the main server database.

The hierarchy for a books database, for example, may
include a top-level division into "fiction" and "nonfiction."
Within fiction, the various literary genres might form the
next level or subdivision, and so forth. With knowledge of
this hierarchy, the bootstrapping process 170 configures the
logic files to link linguistic concepts relating to
entities in the hierarchy with products in the main server
database, so that the logic is configured to recognize, for
example, that "fiction" refers to all fiction books in the
books database. The logic configuration files are also
automatically configured by default, and summarization and
organization of the results uses all levels of the hierarchy.
The prose configuration files are automatically generated with
rules specifying that an output including, for example,
mystery novels, should include the category term "mystery
novels" from the hierarchy. The bootstrapping process 170 may
also "spider" 184 a main server database so as to build a
language lexicon of the site, e.g., words of interest at the
site. This helps in building robust configuration files.

Spidering refers to the process of having a program
automatically download one or more web pages, further
downloading additional pages referenced in the first set of
pages, and repeating this cycle until no further pages are
referenced or until the control specification dictates that
the further pages should not be downloaded. Once downloaded,
further processing is typically performed on the pages.
Specifically, the further processing here involves extracting
terms appearing on the page to build a lexicon.

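
The spidering cycle can be sketched as a breadth-first traversal. To stay self-contained, the Python below crawls a hypothetical in-memory site instead of downloading real pages; `SITE`, `spider` and the `max_pages` limit (standing in for the control specification) are all assumptions.

```python
import re
from collections import deque

# Hypothetical site: page address -> (text, addresses it references).
SITE = {
    "/": ("New and used cars", ["/new", "/used"]),
    "/new": ("New cars by model", ["/", "/new/safety"]),
    "/used": ("Used cars", []),
    "/new/safety": ("Safety and reliability ratings", []),
}

def spider(start, fetch, max_pages=100):
    """Crawl breadth-first from start and return the set of terms seen."""
    queue, seen, lexicon = deque([start]), {start}, set()
    downloaded = 0
    while queue and downloaded < max_pages:
        text, links = fetch(queue.popleft())
        downloaded += 1
        # Further processing: extract the terms appearing on the page.
        lexicon.update(re.findall(r"[a-z]+", text.lower()))
        for link in links:
            if link not in seen:  # never queue a page twice
                seen.add(link)
                queue.append(link)
    return lexicon

terms = spider("/", SITE.__getitem__)
print(sorted(terms))
```

Passing `fetch` as a parameter keeps the traversal separate from the download mechanism, which in a real deployment would be an HTTP client.
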
When the bootstrapping process 170 executes after
original configuration files have been generated, the original
configuration files are compared with the current
configuration files and changes added incrementally as updates
to the original configuration files.

Referring again to FIG. 3, the information interface 80
includes the database aliasing process 88. The database
aliasing process 88 provides a method to infer results when no
direct match occurs. Referring to FIG. 8, a database (db)
aliasing process 200 includes generating 202 an aliasing
file and applying 204 the aliasing file to a user query. The
automatic generation of the database aliasing file reduces the
amount of initial development effort as well as the amount of
ongoing maintenance when the main server database content
changes.

Referring to FIG. 9, a database aliasing file generating
process 220 includes extracting 222 names from the main server
database. The extracted names are normalized 224. The
normalized names are parsed 226. The language lexicon is
applied 228 to the normalized parsed names. A determination
230 is made on whether multiple normalized names map to any
single concept. If so, alias entries are stored 232 in the
database aliasing file. In this manner, the grammar for the
parser can be leveraged to produce the database aliasing file.
This reduces the need for the system developer to input
synonym information in multiple configuration files and also
allows imprecise aliases, which are properly understood by the
parser, to be discovered without any direct manual entry.

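
A minimal Python sketch of the alias file generation follows. A lookup table stands in for the normalize, parse and lexicon steps, which is a considerable simplification; the `CONCEPTS` table and the `build_alias_file` name are hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-in for normalizing and parsing a name to a concept.
CONCEPTS = {"laptop": "laptop", "notebook": "laptop",
            "pc": "personal computer", "personal computer": "personal computer"}

def build_alias_file(names, concepts=CONCEPTS):
    """Group extracted names by concept; keep concepts with several names."""
    groups = defaultdict(set)
    for name in names:
        normalized = " ".join(name.lower().split())
        concept = concepts.get(normalized)
        if concept is not None:
            groups[concept].add(normalized)
    # Only concepts reached by more than one name need an alias entry.
    return {c: sorted(n) for c, n in groups.items() if len(n) > 1}

print(build_alias_file(["Laptop", "NOTEBOOK", "PC", "Monitor"]))
# {'laptop': ['laptop', 'notebook']}
```
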
The db aliasing file, like many of the configuration
files, is generated automatically, as described with reference
to FIG. 9. It can also be manually updated when the context
of the database under investigation changes. The database
aliasing file is loaded and applied in such a way as to shield
its operations from the information interface 80 of FIG. 3.

In a particular embodiment, the application of the db
aliasing file to a query can be used in two directions. More
specifically, in a forward direction, when a user query is
received, applying the database aliasing file to the user
query and resolving variations of spelling, capitalization,
and abbreviations normalizes the user query, so that a
normalized query can be used to search the main server
database. In a reverse direction, if more than one alias is
found, the search results will normalize on a single name for
an item rather than all possible aliases found in the main
server database file.

Referring again to FIG. 4, the information interface 80
includes the information retrieval (IR) process 82. The
purpose of the information retrieval process 82 is to take a
collection of documents on a main server database containing
words, generate an inverse index known as an IR index, and use
the IR index to produce answers to a user query. The
information access process 40 (of FIG. 2) leverages grammar it
develops for front end processing when building the IR index
to generate phrased synonyms (or phrased aliases) for the
document. More specifically, the information access process
40 applies the parser and grammar rules to the document before
the IR index is built. The effect of this can be described by
way of example. One rule may indicate the entity "laptop"
goes to "laptop" or "notebook." Thus, during parsing, if
"notebook" is found, it will be replaced by the entity
"laptop," which then gets rolled into the IR index.

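
The effect of canonicalizing terms before indexing can be shown with a small Python sketch. The single-word rule table is a hypothetical stand-in for the parser and grammar rules, and the index is a plain term-to-document-ids mapping.

```python
from collections import defaultdict

# Hypothetical entity rules: surface term -> canonical entity.
ENTITY_RULES = {"notebook": "laptop", "laptop": "laptop"}

def canonical(tokens, rules=ENTITY_RULES):
    """Replace each token by its canonical entity when a rule applies."""
    return [rules.get(t, t) for t in tokens]

def build_ir_index(documents):
    """Map each canonical term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in canonical(text.lower().split()):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Ids of documents containing every canonical query term."""
    postings = [index.get(t, set()) for t in canonical(query.lower().split())]
    return set.intersection(*postings) if postings else set()

docs = {1: "lightweight notebook", 2: "rugged laptop", 3: "desktop tower"}
index = build_ir_index(docs)
print(sorted(search(index, "notebook")))  # [1, 2]
```

Because documents and queries pass through the same rules, a query for either word finds both documents.
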
At search time, the information access process 40
attempts to find documents containing the search terms of the
user query, and in addition, the incoming user search terms
are run through the parser, that will find multiple entities,
if they exist, of the same term. Thus, combining the parser
and the grammar rules, the information access process 40 maps
a user query into its canonical form of referring to the item.

The information retrieval process 40 may also process a
grammar and generate a grammar index, which can help find
other phrased synonyms that other methods might not find. For
example, "Xeon", an Intel microprocessor whose full
designation is the "Intel Pentium Xeon Processor," may be
represented in canonical form as "Intel Xeon Processor." If a
user query is received for "Intel," "Xeon" would not be found
without the grammar index of the information access process
40. The information access process 40 will search the grammar
index and produce a list of all grammar tokens containing
"Intel," and add this list to the overall search so that the
results would pick up "Xeon," among others.

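
A grammar index of this kind can be sketched in Python as a mapping from each word of a canonical form to the grammar tokens that carry it. The token table is hypothetical (the "fastchip" entry is invented purely as a non-matching example), as are the function names.

```python
# Hypothetical grammar tokens: token -> canonical form.
GRAMMAR_TOKENS = {
    "xeon": "intel xeon processor",
    "pentium": "intel pentium processor",
    "fastchip": "acme fastchip processor",
}

def build_grammar_index(tokens=GRAMMAR_TOKENS):
    """Map each word of a canonical form to the grammar tokens carrying it."""
    index = {}
    for token, canonical_form in tokens.items():
        for word in canonical_form.split():
            index.setdefault(word, set()).add(token)
    return index

def expand(query_term, grammar_index):
    """Return the query term plus every grammar token whose form contains it."""
    return {query_term} | grammar_index.get(query_term, set())

print(sorted(expand("intel", build_grammar_index())))
# ['intel', 'pentium', 'xeon']
```

The expanded list would then be added to the overall search, so that documents mentioning only "Xeon" are still returned for the query "Intel."
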
The use of the parser and grammar rules to specify the
expansion of a full user query to include synonyms allows for
centralization of linguistic knowledge within the grammar
rules, removing a need for additional manual configuration to
gain the query expansion functionality.

Referring to FIG. 10, a query expansion process 250
includes normalizing 252 and parsing 254 the putative text.
The canonical non-terminal representations are inserted 256
into an IR index in place of the actual putative text.

In an embodiment, the putative text is used "as-is."
However, when a search is requested by a user, the putative
search phrase is processed according to the grammar rules to
obtain a canonical non-terminal representation. The grammar
rules are then used in a generative manner to determine which
other possible phrases could have generated the same canonical
non-terminal representation. Those phrases are stored in the
IR index.

The "as-is" method described above is generally slower
and less complete in query expansion coverage, because it may
take too long to generate all possible phrases that reduce to
the same canonical non-terminal representation, so a
truncation of the possible phrase list can occur. However,
the "as-is" method has the advantage of not requiring
re-indexing the original text whenever the grammar rules are
updated.
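
The two expansion strategies can be contrasted in a short sketch. The grammar table, the `<LAPTOP>` non-terminal, and the `limit` parameter are hypothetical and stand in for real grammar rules; a dictionary lookup replaces actual parsing.

```python
from itertools import islice

# Hypothetical grammar: each canonical non-terminal lists phrases that reduce to it.
GRAMMAR = {
    "<LAPTOP>": ["laptop", "notebook", "notebook computer", "portable computer"],
}
PHRASE_TO_CANONICAL = {p: nt for nt, phrases in GRAMMAR.items() for p in phrases}

def normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())

def canonicalize(text):
    """Index-time approach: replace a known phrase with its canonical non-terminal."""
    phrase = normalize(text)
    return PHRASE_TO_CANONICAL.get(phrase, phrase)

def expand_as_is(query, limit=3):
    """Query-time ("as-is") approach: list phrases sharing the query's canonical form.

    The list is cut off at `limit`, which is why coverage can be incomplete.
    """
    canonical = canonicalize(query)
    phrases = GRAMMAR.get(canonical, [normalize(query)])
    return list(islice(phrases, limit))
```

With `limit=3`, `expand_as_is("laptop")` never reaches "portable computer," which mirrors the truncation trade-off. The index-time approach avoids that loss but must re-index the text when `GRAMMAR` changes.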

In a particular embodiment, the information access
process 40 (of FIG. 2) combines an IR index search with a main
server database search to respond to queries that involve a
combination of structured features stored in a database (e.g.,
color) and unstructured information existing in free
text. Structured Query Language (SQL) is used to interface to
a standard relational database management system (RDBMS). To
jointly search an RDBMS and an IR index, the information
access process 40 issues an unstructured search request to the
IR index, uses the results, and issues a SQL query, which
includes a restriction to those initial IR index search
results. However, the free text information in the IR index
may not always correspond to individual records in the RDBMS.
In general, there may be many items in the IR index which
correspond to categories of items in the RDBMS. In order to
improve the efficiency of searches involving such items in the
IR index, the IR index is further augmented with category
hierarchy information. Thus, a match to an item in the IR
index will also retrieve corresponding category hierarchy
information, which can then be mapped to multiple items in the
RDBMS.
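
A joint search of this kind might look like the sketch below, which uses Python's built-in `sqlite3` module as a stand-in RDBMS. The `products` table, its columns, the in-memory `IR_INDEX`, and the function names are all assumptions made for illustration, and the category hierarchy is flattened to a single category name per hit.

```python
import sqlite3

# Hypothetical IR index: free-text term -> matching item ids and category names.
IR_INDEX = {
    "waterproof": {"items": [1, 3], "categories": []},
    "outerwear": {"items": [], "categories": ["jackets"]},
}

def ir_search(term):
    """Unstructured search: return (item ids, categories) for a free-text term."""
    hit = IR_INDEX.get(term, {"items": [], "categories": []})
    return hit["items"], hit["categories"]

def joint_search(conn, term, color):
    """Run the IR search first, then restrict the SQL query to its hits."""
    item_ids, categories = ir_search(term)
    clauses, params = [], []
    if item_ids:
        clauses.append("id IN (%s)" % ",".join("?" * len(item_ids)))
        params += item_ids
    if categories:
        # A category hit maps to every RDBMS row in that category.
        clauses.append("category IN (%s)" % ",".join("?" * len(categories)))
        params += categories
    if not clauses:
        return []
    sql = ("SELECT id FROM products WHERE color = ? AND (%s) ORDER BY id"
           % " OR ".join(clauses))
    return [row[0] for row in conn.execute(sql, [color] + params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT, color TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [(1, "jackets", "red"), (2, "jackets", "blue"),
                  (3, "boots", "red"), (4, "hats", "red")])
```

Here the structured feature (`color`) is checked by SQL, while the free-text term only narrows the candidate rows, either by item id or by category.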

The information access process 40 parser contains the
capability of processing large and ambiguous grammars
efficiently by using a graph rather than "pure" words. The
parser allows the information access process 40 to take the
grammar file and an incoming query and determine the query's
structure. Generally, the parser pre-compiles the grammar
into a binary format. The parser then accepts a query as
input text, processes the query, and outputs a graph.

LR parsing is currently one of the most popular parsing
techniques for context-free grammars. LR parsing is generally
referred to as "bottom-up" because it tries to construct a
parse tree for an input string beginning at the leaves (the
bottom) and working towards the root (the top). The LR parser
scans the input string from left to right and constructs a
rightmost derivation in reverse.
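
The bottom-up idea can be illustrated with a naive shift-reduce recognizer. This sketch is not a table-driven LR parser; it simply reduces whenever the top of the stack matches a rule, and the toy grammar `E -> E + n | n` is an assumption chosen for the example.

```python
# Toy grammar (an assumption for illustration): E -> E + n | n
# The longer rule is listed first so that "E + n" is reduced before a bare "n".
RULES = [("E", ("E", "+", "n")), ("E", ("n",))]

def shift_reduce(tokens):
    """Scan left to right, reducing whenever the top of the stack matches a rule.

    Returns the reductions in the order performed, or None if the input is rejected.
    """
    stack, reductions = [], []
    for token in tokens:
        stack.append(token)                      # shift
        reduced = True
        while reduced:                           # reduce while a rule matches the top
            reduced = False
            for lhs, rhs in RULES:
                if tuple(stack[-len(rhs):]) == rhs:
                    del stack[-len(rhs):]
                    stack.append(lhs)
                    reductions.append((lhs, rhs))
                    reduced = True
                    break
    return reductions if stack == ["E"] else None
```

For the input `["n", "+", "n"]` the reductions are `E -> n` followed by `E -> E + n`. Read backwards, they give the rightmost derivation E => E + n => n + n.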

The information access process 40 improves on the LR
parser by adding the ability to handle ambiguous grammars
efficiently and by permitting the system developer to include
regular expressions on the right-hand side of grammar rules.
In the "standard" LR parser, an ambiguous grammar would
produce a conflict during the generation of LR tables. An
ambiguous grammar is one that can interpret the same sequence
of words as two or more different parse trees. Regular
expressions are commonly used to represent patterns of
alternative and/or optional words. For example, a regular
expression "(a|b)c+" means one or more occurrences of the
letter "c" following either the letter "a" or the letter "b."
In traditional LR parsing, a state machine, typically
represented as a set of states along with transitions between
the states, is used together with a last-in first-out (LIFO)
stack. The state machine is deterministic, that is, the top
symbol on the stack combined with the current state specifies
conclusively what the next state should be. Ambiguity is not
supported in traditional LR parsing because of the
deterministic nature of the state machine.
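
As a small illustration of determinism, the regular expression "(a|b)c+" mentioned above can be written as a transition table in which every (state, symbol) pair has at most one successor. The state names are assumptions, and the sketch omits the LIFO stack that a real LR parser pairs with its state machine.

```python
import re

# Deterministic transition table for (a|b)c+: (state, symbol) -> next state.
TRANSITIONS = {
    ("start", "a"): "head",
    ("start", "b"): "head",
    ("head", "c"): "tail",
    ("tail", "c"): "tail",
}
ACCEPTING = {"tail"}

def accepts(text):
    """Follow exactly one transition per symbol; reject when none exists."""
    state = "start"
    for symbol in text:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return False
    return state in ACCEPTING

def accepts_with_re(text):
    """Reference check using Python's re module on the same pattern."""
    return re.fullmatch(r"(a|b)c+", text) is not None
```

Because the table never offers two successors for the same pair, there is no place to record a second interpretation, which is the limitation the non-deterministic extension removes.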

To support ambiguity, the information access process 40
extends the LR parser to permit non-determinism in the state
machine, that is, in any given state with any given top stack
symbol, more than one successor state is permitted. This
non-determinism is supported in the information access process
40 with the use of a priority queue structure representing
multiple states under consideration by the parser. A priority
queue is a data structure that maintains a list of items
sorted by a numeric score and permits efficient additions to
and deletions from the queue. Because the parser used in the
information access process 40 is permitted to be
simultaneously in multiple states, the parser tracks multiple
stacks, one associated with each current state. This may lead
to inefficiency. However, since the multiple concurrent states
tend to have a natural "tree" structure, because typically one
state transitions to a new set of states through multiple
putative transitions, the multiple stacks can be structured
much more efficiently in memory usage via a similar tree
organization.
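
One way to picture the priority queue and the tree of stacks is the sketch below, built on Python's `heapq` module. The class names, the state labels, and the convention that a lower score is explored first are assumptions; the text does not specify them.

```python
import heapq
from itertools import count

class StackNode:
    """One stack cell; stacks that share a prefix share the same parent chain."""
    __slots__ = ("symbol", "parent")

    def __init__(self, symbol, parent=None):
        self.symbol, self.parent = symbol, parent

    def to_list(self):
        """Return the stack contents from bottom to top."""
        node, symbols = self, []
        while node is not None:
            symbols.append(node.symbol)
            node = node.parent
        return symbols[::-1]

class Agenda:
    """Priority queue of parser states; the lowest score is popped first (assumed)."""

    def __init__(self):
        self._heap, self._tie = [], count()

    def push(self, score, state, stack):
        # The counter breaks ties so states and stacks are never compared.
        heapq.heappush(self._heap, (score, next(self._tie), state, stack))

    def pop(self):
        score, _, state, stack = heapq.heappop(self._heap)
        return score, state, stack

    def __len__(self):
        return len(self._heap)

# Two successor states branch from one shared stack prefix.
shared = StackNode("the", StackNode("<start>"))
agenda = Agenda()
agenda.push(2.0, "noun_reading", StackNode("N", shared))
agenda.push(1.0, "verb_reading", StackNode("V", shared))
```

Both entries point at the same `shared` prefix, so the common part of the two stacks is stored once rather than copied per state.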

In a traditional LR parser, the state diagram can be very
large even for moderate size grammars because the size of the
state diagram tends to grow exponentially with the size of the
grammar. This results in tremendous memory usage because
grammars suitable for natural language tend to be much larger
than those for a machine programming language. In order to
improve the efficiency of the state diagrams, the information
access process 40 makes use of empty transitions that are
known as "epsilon" transitions. The exponential increase in
size occurs because multiple parses may lead to a common rule
in the grammar, but in a deterministic state diagram, because
the state representing the common rule needs to track which of
numerous possible ancestors was used, there needs to be one
state for each possible ancestor. However, because the
information access process 40 has expanded the LR parser to
support ambiguity via support for a non-deterministic state
diagram, the multiple ancestors can be tracked via the
previously described priority queue/stack tree mechanism.
Thus, a common rule can be collapsed into a single state in
the non-deterministic state diagram rather than replicated
multiple times. In general, performing this compression in an
optimal fashion is difficult. However, a large amount of
compression can be achieved by inserting an epsilon whenever
the right-hand side of a grammar rule recurses into a
non-terminal. This has the effect of causing all occurrences of
the same non-terminal in different right-hand sides to be
collapsed in the non-deterministic state diagram. A concern
which the information access process 40 addresses is that any
"left-recursion," that is, a rule which eventually leads to
itself either directly or after the application of other
rules, will result in a set of states in the non-deterministic
state diagram that can be traversed in a circular manner via
epsilon transitions. This would result in potentially infinite
processing while parsing. In order to prevent infinite
processing, if there are multiple possible epsilon transitions
in series, they are reduced to a single epsilon transition.
This may result in a small amount of inaccuracy in the parser,
but avoids the potential for infinite processing.
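
The risk of circling through epsilon transitions can be shown with a standard epsilon-closure routine that remembers visited states. This is a common textbook technique offered as an illustration; it is not claimed to be the exact reduction used by the information access process 40, and the three-state diagram is hypothetical.

```python
def epsilon_closure(state, epsilon_edges):
    """Return every state reachable from `state` through epsilon transitions alone.

    The `seen` set collapses a chain or cycle of epsilon moves into one step,
    so a left-recursive loop cannot be followed forever.
    """
    seen, pending = {state}, [state]
    while pending:
        current = pending.pop()
        for target in epsilon_edges.get(current, ()):
            if target not in seen:
                seen.add(target)
                pending.append(target)
    return seen

# Hypothetical diagram for a left-recursive pair of rules: A leads to B, B back to A.
EPSILON_EDGES = {"A": ["B"], "B": ["A", "C"], "C": []}
```

Without the `seen` check, the A to B to A loop would be traversed indefinitely; with it, `epsilon_closure("A", EPSILON_EDGES)` terminates after visiting each state once.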

The parser of the information access process 40 has also
been expanded to support regular expressions on the right-hand
side of context-free grammar rules. Regular expressions
can always be expressed as context-free rules, but it is
tedious for grammar developers to perform this manual
expansion, increasing the effort required to author a grammar
and the chance for human error. One implementation of this
extension would be to compile the regular expressions into
context-free rules mechanically and integrate these rules into
the larger set of grammar rules. This can be accomplished by
converting regular expressions into finite state automata
through generally known techniques, and then letting a new
non-terminal represent each state in the automata. However,
this approach results in great inefficiency during parsing
because of the large number of newly created states. Also,
this expansion results in parse trees which no longer
correspond to the original, unexpanded grammar, hence
increasing the amount of effort required by the grammar
developer to identify and correct errors during development.

An alternative used by the information access process 40
is to follow the finite state automaton corresponding to a
regular expression during the parsing as if it were part of
the overall non-deterministic state diagram. The difficulty
that arises is that right-hand sides of grammar rules may
correspond to regular expressions of both terminal and
non-terminal symbols in the same rule. Thus, when the LR parser
of the information access process 40 reaches a reduce decision,
there is no longer a good one-to-one correspondence between
stack symbols and the terminal symbols recently processed. A
technique needs to be implemented in order to find the start
of the right-hand side on the stack. However, because the
parser uses epsilons to mark recursions to reduce the state
diagram size, the epsilons also provide useful markers to
indicate on the stack when non-terminals were pursued. With
this information, the LR parser of the information access
process 40 is able to match the stack symbols to the terminals
in the input text being parsed.

Another efficiency of the LR parser of the information
access process 40 involves the ability to support "hints" in
the grammar. Natural language grammars tend to have a
large amount of ambiguity, and ambiguity tends to result in
much lengthier parsing times. In order to keep the amount of
parsing time manageable, steps must be taken to "prune" less
promising putative parses. However, automatic scoring of
parses for their "promise" is non-trivial. There exist
probabilistic techniques, which require training data to learn
probabilities typically associated with each grammar rule. The
LR parser of the information access process 40 uses a
technique that does not require any training data. A grammar
developer is allowed to insert "hints," which are either
markers in the grammar rules with associated "penalty costs"
or "anchors." The penalty costs permit the grammar developer
to instruct the LR parser of the information access process 40
to favor certain parses over others, allowing for pruning of
less favored parses. Anchors indicate to the LR parser that
all other putative parses that have not reached an anchor
should be eliminated. Anchors thus permit the grammar
developer to specify that a given phrase has a strong
likelihood of being the correct parse (or interpretation),
hence, all other parses are discarded.
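
The effect of penalty costs and anchors on pruning can be sketched as follows. The `Parse` record, the `max_penalty` budget, and the rule that anchors are checked only among parses within budget are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Parse:
    label: str
    penalty: float = 0.0          # sum of penalty costs from hint markers crossed so far
    reached_anchor: bool = False  # True once the parse has passed an anchor hint

def prune(parses, max_penalty):
    """Drop parses over the penalty budget; if any survivor reached an anchor,
    keep only the anchored ones. Survivors are returned cheapest first."""
    survivors = [p for p in parses if p.penalty <= max_penalty]
    anchored = [p for p in survivors if p.reached_anchor]
    kept = anchored or survivors
    return sorted(kept, key=lambda p: p.penalty)
```

For example, given parses with penalties 0.5, 3.0, and an anchored parse with penalty 1.0, a budget of 2.0 leaves only the anchored parse. No training data is involved; the grammar developer supplies the costs and anchors directly.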

Another concern with supporting ambiguous grammars is
that the large number of parses consumes much memory to
represent. The LR parser of the information access process 40
is modified to represent a list of alternative parse trees in
a graph structure. In the graph representation, two or more
parse trees, which share common substructure within the parse
tree, are represented as a single structure within the graph.
The edges in the graph representation correspond to grammar
rules. A given path through the graph represents a sequential
application of a series of grammar rules, hence, uniquely
identifying a parse tree.
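
A shared graph of this kind can be sketched as an acyclic adjacency list whose edges carry grammar rules. The node names and the three rules below are hypothetical, and the sketch assumes the graph has no cycles.

```python
# Hypothetical parse graph: node -> list of (grammar rule, next node).
# Both readings share the single edge that follows node "np_done".
GRAPH = {
    "start": [("NP -> Adj N", "np_done"), ("NP -> N N", "np_done")],
    "np_done": [("VP -> V NP", "end")],
    "end": [],
}

def enumerate_parses(graph, node="start", goal="end"):
    """Yield each path as a tuple of grammar rules; one path identifies one parse."""
    if node == goal:
        yield ()
        return
    for rule, next_node in graph[node]:
        for rest in enumerate_parses(graph, next_node, goal):
            yield (rule,) + rest

def stored_edges(graph):
    """Count the rule applications actually stored in the graph."""
    return sum(len(edges) for edges in graph.values())
```

The graph stores three edges, while listing the two parses separately would take four rule applications; the saving grows as more parses share substructure.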

Once a graph representation of potential parses is
generated, at the end of parsing a frame representation of the
relevant potential parses is outputted. This is achieved via a
two-step method. First, the graph is converted into a series
of output directives. The output directives are specified
within the grammar by the grammar developer. Second, frame
generation occurs as instructed by the output directives. The
first step is complicated by the support for regular
expressions within the grammar rules because a node in the
parse tree may correspond to the application of a regular
expression consisting of non-terminals, which in turn
corresponds to application of other grammar rules with
associated output directives. The identity of these
non-terminals is not explicitly stated in the parse tree. In
order to discover these identities, during the first step, the
process follows a procedure very similar to the previously
described LR parser, but instead, because one already has a
parse tree, the parse tree is used to "guide" the search
control strategy. Once the proper identities are discovered,
the corresponding output directives are sent to the second
stage.

The information interface 80 frequently needs to access
multiple tables in an RDBMS in order to fulfill a data request
made by the control logic of the information access process
40. It is unwieldy for the system developer to specify rules
on which tables need to be accessed to retrieve the requested
information. Instead, it is much simpler for the system
developer to simply specify what information is available in
which tables. Given this information, the information
interface 80 finds the appropriate set of tables to access,
and correlates information among the tables. The correlation
is carried out by the information interface 80 (of FIG. 4)
requesting a standard join operation in SQL.

In order to properly identify a set of tables and their
respective join columns, the information interface 80 (of FIG.
4) views the set of tables as nodes in a graph and the
potential join columns as edges in the graph. Given this view,
a standard minimum spanning tree (MST) algorithm may be
applied. However, the input to the information interface 80 is
a request based on features and not on tables. One
problem is that the same feature may be represented in more
than one table. Thus, there may be multiple sets of tables
that can potentially provide the information requested. In
order to identify the optimal set of tables and join columns,
the information interface 80 must apply an MST algorithm to
each possible set of tables. Because the number of possible
sets can expand exponentially, this can be a very
time-consuming process. The information interface 80 also has
the ability to make an approximation as follows. There is a
subset, which may be zero, one, or more, of features, which
are represented in only one table per feature. These tables
therefore are a mandatory subset of the set of tables to be
accessed. In the approximation, the information interface 80
first applies an MST algorithm to the mandatory subset, and
then expands the core subset so as to include all the
requested tables. The expansion seeks to minimize the number
of additional joins needed to cover each feature not covered
by the mandatory subset.
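
The approximation can be sketched with Kruskal's algorithm standing in for the "standard MST algorithm." The table names, the unit join costs, and the greedy preference for a table that joins directly to one already chosen are assumptions; the text states only that the expansion seeks to minimize additional joins.

```python
def kruskal(nodes, edges):
    """Minimum spanning tree (or forest) over `nodes`.

    Each edge is (cost, table_a, table_b, join_column).
    """
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # path halving
            n = parent[n]
        return n

    tree = []
    for _cost, a, b, column in sorted(edges):
        if a in parent and b in parent and find(a) != find(b):
            parent[find(a)] = find(b)
            tree.append((a, b, column))
    return tree

def choose_tables(features, feature_tables, edges):
    """Start from tables that are the only home of a feature, then grow greedily."""
    chosen = {feature_tables[f][0] for f in features if len(feature_tables[f]) == 1}
    for f in features:
        candidates = feature_tables[f]
        if chosen.intersection(candidates):
            continue  # already covered, so no extra join is needed
        # Prefer a candidate that joins directly to a table already chosen.
        direct = [t for t in candidates
                  if any({a, b} == {t, c} for _cost, a, b, _col in edges for c in chosen)]
        chosen.add((direct or candidates)[0])
    return chosen, kruskal(chosen, edges)

# Hypothetical schema: which tables hold each feature, and how tables can be joined.
FEATURE_TABLES = {"price": ["products"],
                  "color": ["products", "variants"],
                  "rating": ["reviews", "summaries"]}
JOIN_EDGES = [(1, "products", "reviews", "product_id"),
              (1, "products", "variants", "product_id"),
              (1, "variants", "summaries", "variant_id")]
```

With these inputs, "price" makes `products` mandatory, "color" is already covered, and "rating" is served by `reviews` because it needs one join rather than two. The resulting tree lists the join columns to use in the SQL join.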

Other embodiments are within the following claims.

WHAT IS CLAIMED IS:

1. A computer-implemented method of accessing
information, comprising:

processing a query;

searching a collection of data for a set of results
matching the query;

selectively reducing the set of results to generate
a subset of results;

outputting a prose rendition of the query; and

outputting the subset of results.

2. The computer-implemented method of claim 1 wherein
processing the query comprises:

parsing the query to generate a search fragment; and

adding context to the search fragment.

3. The computer-implemented method of claim 2 wherein
adding context comprises extracting data from a web page from
which the query was received.



4. The computer-implemented method of claim 1 wherein
the processing the query comprises:

normalizing text of the query;

parsing the text; and

providing meaning to the text.



5. The computer-implemented method of claim 4 wherein
the processing the query further comprises associating context
with the text.

6. The computer-implemented method of claim 1 wherein
the selectively reducing comprises:

placing the set of results in a hierarchical data
structure organized by taxonomy; and

discarding results positioned at a lowest level of
the hierarchical data structure.

7. The computer-implemented method of claim 1 wherein
the outputting the prose rendition comprises:

processing the query in conjunction with rules of
grammar; and

processing the query in conjunction with a prose
configuration file.

8. The computer-implemented method of claim 1 wherein
the outputting of the subset comprises:

placing the subset in a table.

9. The computer-implemented method of claim 5 further
comprising customizing the table to the query.

10. A computer program, residing on a computer-readable
medium, comprising instructions for causing a computer to:

process a query;

search a collection of data for a set of results
matching the query;

selectively reduce the set of results to generate
a subset of results;

output a prose rendition of the query; and

output the subset of results.